60 research outputs found

    Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering

    In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged based on the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as principal components of the clusters. The number of principal components is determined automatically from the data being clustered. The performance of this unsupervised fuzzy clustering algorithm is evaluated in several experiments on an artificial data set and a gene expression data set. Singapore-MIT Alliance (SMA)
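The merging loop described in this abstract can be illustrated with a minimal sketch. The abstract does not give the similarity measure or the adaptive threshold rule, so the code below makes two assumptions: it uses a fuzzy Jaccard ratio (sum of minima over sum of maxima of memberships) as a stand-in similarity, and a fixed `threshold` argument in place of the adaptive one. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def fuzzy_similarity(U):
    """Pairwise similarity between the rows of a membership matrix U (clusters x points).

    Assumption: fuzzy Jaccard ratio, not necessarily the paper's measure.
    """
    c = U.shape[0]
    S = np.eye(c)
    for i in range(c):
        for j in range(i + 1, c):
            inter = np.minimum(U[i], U[j]).sum()
            union = np.maximum(U[i], U[j]).sum()
            S[i, j] = S[j, i] = inter / union if union > 0 else 0.0
    return S

def merge_similar_clusters(U, threshold):
    """Repeatedly merge the most similar pair of clusters until none exceeds the threshold.

    Assumption: a fixed threshold replaces the adaptive threshold of the abstract.
    """
    U = np.asarray(U, dtype=float).copy()
    while U.shape[0] > 1:
        S = fuzzy_similarity(U)
        np.fill_diagonal(S, -np.inf)          # ignore self-similarity
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if S[i, j] < threshold:
            break
        U[i] = U[i] + U[j]                    # merged memberships; columns still sum to 1
        U = np.delete(U, j, axis=0)
    return U

# Three clusters over four points; the first two overlap heavily.
U0 = [[0.50, 0.45, 0.05, 0.0],
      [0.45, 0.50, 0.05, 0.0],
      [0.05, 0.05, 0.90, 1.0]]
merged = merge_similar_clusters(U0, threshold=0.5)
print(merged.shape)
```

Starting from an overspecified number of clusters, the loop stops on its own once the remaining clusters are sufficiently distinct, which is how merging sidesteps choosing the cluster count in advance.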

    Hierarchical Multi-Bottleneck Classification Method And Its Application to DNA Microarray Expression Data

    The recent development of DNA microarray technology is creating a wealth of gene expression data. Typically these datasets have high dimensionality and considerable variety. Analysis of DNA microarray expression data is a fast-growing research area that interfaces various disciplines such as biology, biochemistry, computer science and statistics. It is concluded that clustering and classification techniques can be successfully employed to group genes based on the similarity of their expression patterns. In this paper, a hierarchical multi-bottleneck classification method is proposed, and it is applied to classify a publicly available gene microarray expression dataset of the budding yeast Saccharomyces cerevisiae. Singapore-MIT Alliance (SMA)

    The Modular Organization of Protein Interactions in Escherichia coli

    Escherichia coli serves as an excellent model for the study of fundamental cellular processes such as metabolism, signalling and gene expression. Understanding the function and organization of proteins within these processes is an important step towards a ‘systems’ view of E. coli. Integrating experimental and computational interaction data, we present a reliable network of 3,989 functional interactions between 1,941 E. coli proteins (∼45% of its proteome). These were combined with a recently generated set of 3,888 high-quality physical interactions between 918 proteins and clustered to reveal 316 discrete modules. In addition to known protein complexes (e.g., RNA and DNA polymerases), we identified modules that represent biochemical pathways (e.g., nitrate regulation and cell wall biosynthesis) as well as batteries of functionally and evolutionarily related processes. To aid the interpretation of modular relationships, several case examples are presented, including both well characterized and novel biochemical systems. Together these data provide a global view of the modular organization of the E. coli proteome and yield unique insights into structural and evolutionary relationships in bacterial networks

    Expanding the Landscape of Chromatin Modification (CM)-Related Functional Domains and Genes in Human

    Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoans, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus Machine Learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available.
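The pair-wise domain co-occurrence analysis mentioned in this abstract can be sketched roughly as follows. This is an illustration only, assuming each protein is given as a list of domain names; the protein identifiers `P1` to `P3`, their domain assignments, and the function names are hypothetical, and the abstract does not state how the authors computed or normalized these counts.

```python
from collections import Counter
from itertools import combinations

def domain_cooccurrence(proteins):
    """Count how many proteins contain each unordered pair of distinct domains."""
    pair_counts = Counter()
    for domains in proteins.values():
        # De-duplicate repeated domains within a protein, then enumerate pairs.
        for a, b in combinations(sorted(set(domains)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def partner_counts(pair_counts):
    """Number of distinct partner domains per domain (a simple promiscuity proxy)."""
    partners = {}
    for a, b in pair_counts:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {domain: len(others) for domain, others in partners.items()}

# Hypothetical toy input: protein ID -> domain composition.
toy_proteins = {
    "P1": ["SET", "PHD"],
    "P2": ["SET", "PHD", "Bromo"],
    "P3": ["Chromo"],
}
pairs = domain_cooccurrence(toy_proteins)
print(pairs[("PHD", "SET")])
print(partner_counts(pairs))
```

A domain with few distinct partners across many proteins would count as non-promiscuous in the sense used above; a domain that never co-occurs with another, such as `Chromo` in the toy input, simply has no entry.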

    Transfer-free, lithography-free and fast growth of patterned CVD graphene directly on insulators by using sacrificial metal catalyst

    Chemical vapor deposited graphene suffers from two problems: transfer from metal catalysts to insulators, and photoresist-induced degradation during patterning. Both result in macroscopic and microscopic damage such as holes, tears, doping, and contamination, which translates into degraded properties and lower yield. We attempt to solve both problems simultaneously. A nickel thin film is evaporated on SiO2 as a sacrificial catalyst, on whose surface graphene is grown. A polymer (PMMA) support is spin-coated on the graphene. During the Ni wet etching process, the etchant can permeate the polymer, making the etching efficient. The PMMA/graphene layer is fixed on the substrate by controlling the surface morphology of the Ni film during graphene growth. After etching, the graphene naturally adheres to the insulating substrate. By using this method, transfer-free, lithography-free and fast growth of graphene is realized. The whole experiment has good repeatability and controllability. Compared with graphene transfer between substrates, no mechanical manipulation is required here, leading to minimal damage. Due to the presence of Ni, the graphene quality is intrinsically better than that of catalyst-free growth. The Ni thickness and growth temperature are controlled to limit the number of graphene layers. The technology can be extended to grow other two-dimensional materials with other catalysts.

    Generation and analysis of a mouse intestinal metatranscriptome through Illumina based RNA-sequencing

    With the advent of high-throughput sequencing (HTS), the emerging science of metagenomics is transforming our understanding of the relationships of microbial communities with their environments. While metagenomics aims to catalogue the genes present in a sample, metatranscriptomics, by assessing which genes are actively expressed, can provide a mechanistic understanding of community inter-relationships. To achieve these goals, several challenges need to be addressed, from sample preparation to sequence processing, statistical analysis and functional annotation. Here we use an inbred non-obese diabetic (NOD) mouse model in which germ-free animals were colonized with a defined mixture of eight commensal bacteria, to explore methods of RNA extraction and to develop a pipeline for the generation and analysis of metatranscriptomic data. Applying the Illumina HTS platform, we sequenced 12 NOD cecal samples prepared using multiple RNA-extraction protocols. The absence of a complete set of reference genomes necessitated a peptide-based search strategy. Up to 16% of sequence reads could be matched to a known bacterial gene. Phylogenetic analysis of the mapped ORFs revealed a distribution consistent with ribosomal RNA, the majority from Bacteroides or Clostridium species. To place these HTS data within a systems context, we mapped the relative abundance of corresponding Escherichia coli homologs onto metabolic and protein-protein interaction networks. These maps identified bacterial processes with components that were well-represented in the datasets. In summary, this study highlights the potential of exploiting the economy of HTS platforms for metatranscriptomics.

    Comparison of substrate specificity of the ubiquitin ligases Nedd4 and Nedd4-2 using proteome arrays

    Target recognition by the ubiquitin system is mediated by E3 ubiquitin ligases. Nedd4 family members are E3 ligases comprised of a C2 domain, 2–4 WW domains that bind PY motifs (L/PPxY) and a ubiquitin ligase HECT domain. The nine Nedd4 family proteins in mammals include two close relatives: Nedd4 (Nedd4-1) and Nedd4L (Nedd4-2), but their global substrate recognition or differences in substrate specificity are unknown. We performed in vitro ubiquitylation and binding assays of human Nedd4-1 and Nedd4-2, and rat-Nedd4-1, using protein microarrays spotted with ∼8200 human proteins. Top hits (substrates) for the ubiquitylation and binding assays mostly contain PY motifs. Although several substrates were recognized by both Nedd4-1 and Nedd4-2, others were specific to only one, with several Tyr kinases preferred by Nedd4-1 and some ion channels by Nedd4-2; this was subsequently validated in vivo. Accordingly, Nedd4-1 knockdown or knockout in cells led to sustained signalling via some of its substrate Tyr kinases (e.g. FGFR), suggesting Nedd4-1 suppresses their signalling. These results demonstrate the feasibility of identifying substrates and deciphering substrate specificity of mammalian E3 ligases

    Genetic Interaction Maps in Escherichia coli Reveal Functional Crosstalk among Cell Envelope Biogenesis Pathways

    As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among >235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target

    Image database organization in a content-based image retrieval system

    Content-based retrieval of visual information is an emerging technology that extends traditional information retrieval to data repositories containing visual information, such as images or videos. The technologies behind this involve aspects of signal processing, computer vision, machine learning and information retrieval. Doctor of Philosophy (EEE)